Skip to main content

Regular Expressions

SMTLIB2 standard The theory of unicode strings and regular expressions

The sort constructor RegEx takes as argument a sequence type. The set of regular expressions over strings is thus (RegEx String); it is synonymous with the sort RegLan defined in the SMTLIB2 format.

Summary of Operations

OperationBrief Description
(str.to.re s)Convert string to regular expression accepting s
(str.in.re s r)Determine if s is in the language generated by r
re.allcharThe regular expression accepting every string
re.nostrThe regular expression rejecting every string
(re.range ch1 ch2)The range of characters (represented as strings) between ch1 and ch2
(re.++ r1 r2 r3)Concatenation of regular expressions
(re.* r)Kleene star
(re.+ r)Kleene plus
(re.opt r)Zero or one use of r
((_ re.loop lo hi) r)from lo to hi number of repetitions of r
(re.union r1 r2)The union of regular languages
(re.inter r1 r2)The intersection of regular languages
(seq.to.re s)Convert sequence to regular expression accepting s
(seq.in.re s r)Determine if sequence s is in the language generated by r
(as re.all R)The regular expression of sort R accepting every sequence
(as re.empty R)The regular expression of sort R rejecting every sequence
(re.of.prop p)Sequences of length 1 where character satisfies predicate p. The sort of p is (Array C Bool), where C is the character sort.
(re.replace_re s r dst)Currently not supported: replace left-most smallest occurrence matching r in s by dst
(re.replace_re_all s r dst)Currently not supported: replace, traversing left-to-right, smallest matches, all occurrences matching r in s by dst

The re.range operator expects two strings each encoding a single character. For example (re.range "a" "\u{ff}") is a valid range of characters, while (re.range "aa" "") is the empty language.

For compatibility with the SMTLIB2 format Z3 also accepts expressions of the form (re.loop r lo hi). Z3 understands only the meaning of these terms when lo, hi are integer numerals.

What (not) to expect of regular expressions

The default solver for regular expressions unfolds membership relations of regular expressions lazily. It uses symbolic derivatives . This approach works for many membership and non-membership constraints, but is not a complete procedure when membership constraints are combined with constraints over strings. Note that the syntax allows forming symbolic regular expressions that contain uninterpreted non-terminals. It also does not handle regular expressions symbolic sequences (it allows to express non-regular languages). Thus, the string s in (str.to.re s) should be a string literal. You can write formulas with equalities over regular expressions. Z3 is a decision procedure for equalities and disequalities between non-symbolic regular expressions.

Examples

The maximal length is 6 for a string of length 2 repeated at most 3 times.